refactor: implement cooperative state machine for range/list operations #18204

Merged
drmingdrmer merged 1 commit into databendlabs:main from drmingdrmer:315-bb
Jun 21, 2025

Conversation

@drmingdrmer
Member

@drmingdrmer drmingdrmer commented Jun 20, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

refactor: implement cooperative state machine for range/list operations

Fix blocking issue during initialization data transmission.

Problem:

When establishing a watch stream, the meta-service sends large amounts of
initialization data to the client. During this transmission, other events
are blocked until completion, including add-watcher commands.

This stalls the Dispatcher in a way that behaves like a deadlock: while a
large batch of initialization data is being sent, all subsequent Dispatcher
operations are blocked. When a second watch request arrives, it must wait
for the first one to finish sending all of its initialization data.
Since adding a new watcher requires holding the state machine lock,
multiple concurrent watch requests block the state machine entirely,
causing timeouts for other requests.
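
The blocking behaviour described above can be pictured with a small single-threaded model. This is only an illustrative sketch, not the meta-service code: the `Command` enum, its variants, and `run_blocking` are hypothetical names, and the "transmission" is reduced to appending lines to a log.

```rust
use std::collections::VecDeque;

/// Hypothetical commands handled by a single dispatcher loop.
enum Command {
    /// Register a watcher and send it `init_items` initialization records.
    AddWatcher { id: u32, init_items: usize },
    /// Any other request that needs the dispatcher.
    Other(&'static str),
}

/// Handle each command to completion before taking the next one.
/// Returns the order in which events happened.
fn run_blocking(mut queue: VecDeque<Command>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some(cmd) = queue.pop_front() {
        match cmd {
            Command::AddWatcher { id, init_items } => {
                log.push(format!("add watcher {}", id));
                // The whole initialization transfer runs inline, so nothing
                // else in the queue is looked at until it finishes.
                for i in 0..init_items {
                    log.push(format!("watcher {}: init item {}", id, i));
                }
            }
            Command::Other(name) => log.push(format!("handle {}", name)),
        }
    }
    log
}
```

In this model, a second `AddWatcher` or an `Other` request queued behind a watcher with many initialization items only appears in the log after every one of those items, which mirrors the waiting described in the problem statement.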

Solution:

Make the process cooperative by not waiting for watch stream transmission
to complete. Instead, queue the add-watcher command and return immediately.
This allows subsequent watch requests to proceed without waiting for previous
initialization data transmissions to finish.
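
One way to picture the cooperative approach is a work queue in which registering a watcher only enqueues the transfer and returns, and the transfer itself proceeds in bounded steps. The sketch below is a toy model under stated assumptions, not the code merged in this PR: `Task`, `CHUNK`, and `run_cooperative` are hypothetical names, and splitting the transfer into fixed-size chunks is an assumption made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical work items for a single dispatcher loop.
enum Task {
    /// Register a watcher that needs `init_items` initialization records.
    AddWatcher { id: u32, init_items: usize },
    /// Continue a transfer: `next` is the next record index, `total` the count.
    SendInit { id: u32, next: usize, total: usize },
}

/// Assumed upper bound on records sent per step (illustrative value).
const CHUNK: usize = 2;

/// Process tasks so that no single transfer monopolizes the loop.
/// Returns the order in which events happened.
fn run_cooperative(mut queue: VecDeque<Task>) -> Vec<String> {
    let mut log = Vec::new();
    while let Some(task) = queue.pop_front() {
        match task {
            Task::AddWatcher { id, init_items } => {
                log.push(format!("add watcher {}", id));
                // Registration returns right away; the transfer is queued
                // as separate work instead of being awaited here.
                if init_items > 0 {
                    queue.push_back(Task::SendInit { id, next: 0, total: init_items });
                }
            }
            Task::SendInit { id, next, total } => {
                let end = (next + CHUNK).min(total);
                for i in next..end {
                    log.push(format!("watcher {}: init item {}", id, i));
                }
                // Yield to other queued work, then resume from `end`.
                if end < total {
                    queue.push_back(Task::SendInit { id, next: end, total });
                }
            }
        }
    }
    log
}
```

With two `AddWatcher` tasks queued back to back, both registrations are logged before any initialization record is sent, so the second watch request no longer waits for the first transfer to finish.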

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Refactoring

Related Issues


@github-actions github-actions Bot added the pr-refactor this PR changes the code base without new features or bugfix label Jun 20, 2025
@drmingdrmer
Member Author

This should fix the 4s timeout that occurs when the meta-service initializes a watch stream with a large amount of initialization data. @everpcpc @bohutang

@drmingdrmer drmingdrmer requested review from bohutang and everpcpc June 20, 2025 14:16
@drmingdrmer drmingdrmer marked this pull request as ready for review June 20, 2025 14:16
Member

@bohutang bohutang left a comment

LGTM 👍

@drmingdrmer drmingdrmer merged commit 0224108 into databendlabs:main Jun 21, 2025
148 of 150 checks passed
@drmingdrmer drmingdrmer deleted the 315-bb branch June 21, 2025 02:45